Behavioral considerations suggest an average reward TD model of the dopamine system
Abstract
Recently there has been much interest in modeling the activity of primate midbrain dopamine neurons as signalling reward prediction error. But since the models are based on temporal-difference (TD) learning, they assume an exponential decline with time in the value of delayed reinforcers, an assumption long known to conflict with animal behavior. We show that a variant of TD learning that tracks variations in the average reward per timestep rather than cumulative discounted reward preserves the models' success at explaining neurophysiological data while significantly increasing their applicability to behavioral data. © 2000 Published by Elsevier Science B.V. All rights reserved.
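The abstract contrasts standard exponentially discounted TD with a variant that subtracts an estimate of the average reward per timestep from the prediction error. A minimal sketch of a tabular average-reward TD(0) update, assuming the generic textbook form of the rule (function and parameter names are illustrative, not the authors' implementation):

```python
def average_reward_td_step(V, rho, s, r, s_next, alpha=0.1, beta=0.01):
    """One average-reward TD(0) update.

    Instead of discounting delayed rewards exponentially, the prediction
    error subtracts `rho`, a running estimate of the average reward per
    timestep:

        delta = r - rho + V(s') - V(s)
    """
    delta = r - rho + V[s_next] - V[s]  # average-reward TD error
    V[s] = V[s] + alpha * delta         # update the value estimate
    rho = rho + beta * delta            # track average reward via the error
    return V, rho, delta
```

With `V = {0: 0.0, 1: 0.0}` and `rho = 0.0`, observing reward 1.0 on the transition 0 → 1 yields an error of 1.0, nudging `V[0]` toward 0.1 and `rho` toward 0.01.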
Similar articles
Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network.
Behavioral conditioning of cue-reward pairing results in a shift of midbrain dopamine (DA) cell activity from responding to the reward to responding to the predictive cue. However, the precise time course and mechanism underlying this shift remain unclear. Here, we report a combined single-unit recording and temporal difference (TD) modeling approach to this question. The data from recordings i...
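The eligibility-trace mechanism mentioned above can be sketched as tabular TD(λ) with accumulating traces. This is the generic textbook form, not the cited study's specific model; all names and step sizes are illustrative:

```python
def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.95, lam=0.9):
    """One TD(lambda) update with accumulating eligibility traces.

    Every state carries a trace e[st] marking how recently it was visited;
    the TD error is broadcast to all states in proportion to their traces,
    letting credit reach cues that preceded the reward by several steps.
    """
    delta = r + gamma * V[s_next] - V[s]  # one-step TD error
    e[s] += 1.0                           # mark the current state eligible
    for st in V:
        V[st] += alpha * delta * e[st]    # credit proportional to trace
        e[st] *= gamma * lam              # decay every trace
    return V, e, delta
```

Setting `lam=0` recovers plain TD(0), where only the just-visited state is updated.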
PVLV: the primary value and learned value Pavlovian learning algorithm.
The authors present their primary value learned value (PVLV) model for understanding the reward-predictive firing properties of dopamine (DA) neurons as an alternative to the temporal-differences (TD) algorithm. PVLV is more directly related to underlying biology and is also more robust to variability in the environment. The primary value (PV) system controls performance and learning during pri...
Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System
The phasic firing of dopamine neurons has been theorized to encode a reward-prediction error as formalized by the temporal-difference (TD) algorithm in reinforcement learning. Most TD models of dopamine have assumed a stimulus representation, known as the complete serial compound, in which each moment in a trial is distinctly represented. We introduce a more realistic temporal stimulus represen...
TD models of reward predictive responses in dopamine neurons
This article focuses on recent modeling studies of dopamine neuron activity and their influence on behavior. Activity of midbrain dopamine neurons is phasically increased by stimuli that increase the animal's reward expectation and is decreased below baseline levels when the reward fails to occur. These characteristics resemble the reward prediction error signal of the temporal difference (TD) ...
Context and Salience: the Role of Dopamine in Reward Learning and Neuropsychiatric Disorders
Evidence suggests that a change in the firing rate of dopamine (DA) cells is a major neurobiological correlate of learning. The Temporal Difference (TD) learning algorithm provides a popular account of the DA signal as conveying the error between expected and actual rewards. Other accounts have attempted to code the DA firing pattern as conveying surprise or salience. The DA mediat...
Journal: Neurocomputing
Volume: 32-33
Pages: -
Published: 2000